Abstract

Consider the problem of forward error correction for the additive white Gaussian noise (AWGN) channel. For
finite-blocklength codes the backoff from the channel capacity is inversely proportional to the square root of the
blocklength. In this paper it is shown that codes achieving this tradeoff must necessarily have a peak-to-average
power ratio (PAPR) proportional to the logarithm of the blocklength. This is extended to codes approaching capacity
more slowly, and to PAPR measured at the output of an OFDM modulator. As a by-product, the convergence of (Smith's)
amplitude-constrained AWGN capacity to Shannon's classical formula is characterized in the regime of large
amplitudes. This converse-type result builds upon recent contributions in the study of empirical output
distributions of good channel codes.

Index Terms 

Shannon theory, channel coding, Gaussian channels, peak-to-average power ratio, converse 

I. Introduction 

In the additive white Gaussian noise (AWGN) communication channel a (Nyquist-sampled) waveform
$x^n = (x_1, \ldots, x_n) \in \mathbb{R}^n$ experiences an additive degradation:

$$Y_j = X_j + Z_j, \qquad Z_j \sim \mathcal{N}(0, 1), \tag{1}$$

where $Y^n = (Y_1, \ldots, Y_n)$ represents a (Nyquist-sampled) received signal. An $(n, M, \epsilon, P)$
error-correcting code is a pair of maps $f : \{1, \ldots, M\} \to \mathbb{R}^n$ and
$g : \mathbb{R}^n \to \{1, \ldots, M\}$ such that

$$\mathbb{P}[W \neq \hat{W}] \le \epsilon,$$

where $W \in \{1, \ldots, M\}$ is a uniformly distributed message, and

$$X^n = f(W), \tag{2}$$
$$\hat{W} = g(Y^n) = g(f(W) + Z^n) \tag{3}$$

are the (encoded) channel input and the decoder's output, respectively. The channel input is required to satisfy the
power constraint

$$\|x^n\|_2 = \Big(\sum_{j=1}^{n} |x_j|^2\Big)^{1/2} \le \sqrt{nP}. \tag{4}$$




The non-asymptotic fundamental limit of information transmission over the AWGN channel is given by

$$M^*(n, \epsilon, P) = \max\{M : \exists\, (n, M, \epsilon, P)\text{-code}\}.$$

It is known that [1]$^{1}$

$$\log M^*(n, \epsilon, P) = nC(P) - \sqrt{nV(P)}\, Q^{-1}(\epsilon) + O(\log n), \tag{6}$$

where the capacity $C(P)$ and the dispersion $V(P)$ are given by

$$C(P) = \frac{1}{2}\log(1 + P), \tag{7}$$
$$V(P) = \frac{\log^2 e}{2}\, \frac{P(P + 2)}{(P + 1)^2}. \tag{8}$$
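As a rough numerical companion to (6)-(8), the following minimal sketch evaluates the first two terms of the expansion and ignores the $O(\log n)$ remainder. It works in nats (natural logarithms), which is an assumption on the base, and the function names and parameter values are illustrative only.

```python
import math
from statistics import NormalDist


def capacity(P):
    """AWGN capacity C(P) = 0.5 * log(1 + P), in nats per channel use."""
    return 0.5 * math.log1p(P)


def dispersion(P):
    """AWGN dispersion V(P) = P(P+2) / (2 (P+1)^2), in nats^2 per channel use."""
    return P * (P + 2) / (2 * (P + 1) ** 2)


def q_inv(eps):
    """Inverse of the Gaussian tail function Q(x) = P[N(0,1) > x]."""
    return NormalDist().inv_cdf(1 - eps)


def normal_approx_rate(n, eps, P):
    """Rate (nats per use) from the first two terms of (6); O(log n) is dropped."""
    return capacity(P) - math.sqrt(dispersion(P) / n) * q_inv(eps)


if __name__ == "__main__":
    P, eps = 100.0, 1e-3  # illustrative values, not taken from any experiment
    for n in (10**2, 10**3, 10**4):
        print(n, normal_approx_rate(n, eps, P) / capacity(P))
```

The printed ratio is the fraction of capacity predicted by the two-term approximation; it approaches one at speed $1/\sqrt{n}$, which is the backoff referred to above.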

The peak-to-average power ratio (PAPR) of $x^n$ is defined as

$$\mathrm{PAPR}(x^n) \triangleq \frac{\|x^n\|_\infty^2}{\frac{1}{n}\|x^n\|_2^2},$$

where $\|x^n\|_\infty = \max_{j=1,\ldots,n} |x_j|$. This definition of PAPR corresponds to the case when the actual
continuous-time waveform is produced from $x^n$ via pulse-shaping (and heterodyning). If low-pass filtering is
employed instead, the maximal amplitude of the signal may be attained in between Nyquist samples, and thus the PAPR
observed by the high-power amplifier may be even larger.
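The sample-domain definition above can be sketched in a few lines. The function name `papr` is ours, and the sketch only covers the Nyquist-sample definition, not the continuous-time waveform.

```python
def papr(x):
    """PAPR(x^n) = ||x||_inf^2 / ((1/n) ||x||_2^2) for a nonzero sequence x."""
    peak_power = max(abs(v) for v in x) ** 2
    average_power = sum(abs(v) ** 2 for v in x) / len(x)
    return peak_power / average_power


if __name__ == "__main__":
    print(papr([1.0, -1.0, 1.0, -1.0]))  # constant-modulus sequence
    print(papr([2.0, 0.0, 0.0, 0.0]))    # all energy in a single sample
```

A constant-modulus sequence attains the smallest possible value, 1, while concentrating all energy in one sample attains the largest possible value, $n$.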

In this paper we address the following question: What are the PAPR requirements of codes that attain or come
reasonably close to attaining the performance of the best possible codes (6)? In other words, we need to assess the
penalty on $\log M^*$ introduced by imposing, in addition to (4), an amplitude constraint:

$$\|x^n\|_\infty \le A_n, \tag{9}$$

where $A_n$ is a certain sequence. If $A_n$ is fixed, then even the capacity term in (6) changes according to a
well-known result of Smith [2]. Here, thus, we focus on the case of growing $A_n$.

Previously, we have shown, [3, Theorem 6] and [4], that very good codes for AWGN automatically satisfy
$A_n = O(\sqrt{\log n})$. Namely, any code with

$$\log M \ge nC - \sqrt{nV(P)}\, Q^{-1}(\epsilon) - \gamma \log n \tag{10}$$

has at least $\frac{M}{2}$ codewords with

$$\|x\|_\infty = O(\sqrt{\log n}).$$

$^{1}$As usual, all logarithms $\log$ and exponents $\exp$ are taken to an arbitrary fixed base, which also specifies
the information units. $Q^{-1}$ is the inverse of the standard $Q$-function:

$$Q(x) = \int_x^\infty \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy. \tag{5}$$



In other words, very good codes cannot have PAPR worse than $O(\log n)$. On the other hand, for the
capacity-achieving input $X^n \sim \mathcal{N}(0, P)^n$, classical results from extreme value theory show that the
peak amplitude behaves with high probability according to $\|X^n\|_\infty = \sqrt{2P \log n} + O_P(1)$ [5].
Therefore it is reasonable to expect that good codes must also have peak amplitude scaling as $\sqrt{2\log n}$.
Indeed, in this paper we show that, even under much weaker assumptions on coding performance than (10), the PAPR
cannot be of a smaller order than $\log n$.
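The extreme-value scaling quoted above is easy to observe empirically. The following sketch draws one i.i.d. Gaussian vector with a fixed seed and compares its peak amplitude with $\sqrt{2P \ln n}$; the natural logarithm, the seed and the function names are our choices for illustration.

```python
import math
import random


def gaussian_peak(n, P, seed=0):
    """Peak amplitude max_j |X_j| of one draw of n i.i.d. N(0, P) samples."""
    rng = random.Random(seed)
    sigma = math.sqrt(P)
    return max(abs(rng.gauss(0.0, sigma)) for _ in range(n))


def predicted_peak(n, P):
    """Leading-order extreme-value prediction sqrt(2 P ln n)."""
    return math.sqrt(2 * P * math.log(n))


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        print(n, gaussian_peak(n, 1.0), predicted_peak(n, 1.0))
```

For moderate $n$ the empirical peak typically sits somewhat below the prediction, since the lower-order corrections are negative; the corresponding PAPR is roughly the squared peak divided by $P$.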

This result is to be contrasted with recent investigations of PAPR in orthogonal frequency division multiplexing
(OFDM) modulation. Given $x^n \in \mathbb{C}^n$, the baseband OFDM (with $n$ subcarriers) signal $s_b(t)$ is given by

$$s_b(t) = \frac{1}{\sqrt{n}} \sum_{k=1}^{n} x_k\, e^{2\pi i \frac{kt}{n}},$$

whereas the transmitted signal is

$$s(t) = \mathrm{Re}\big(e^{2\pi i f_c t}\, s_b(t)\big), \qquad 0 \le t < n, \tag{11}$$

where $f_c$ is the carrier frequency. For large $f_c$, the PAPR of $s(t)$ may be approximated as [6, Chapter 5]

$$\text{OFDM-PAPR}(x^n) \triangleq \frac{\max_{t \in [0,n]} |s(t)|^2}{\frac{1}{n}\|x^n\|_2^2} \approx \frac{\max_{t \in [0,n]} |s_b(t)|^2}{\frac{1}{n}\|x^n\|_2^2},$$

where the quantity on the right-hand side is known as the peak-to-mean envelope power ratio (PMEPR). Note that the
values of $s_b(\cdot)$ at integer times simply represent the discrete Fourier transform (DFT) of $x^n$. Thus PMEPR is
always lower bounded by

$$\mathrm{PMEPR}(x^n) \ge \frac{\|F x^n\|_\infty^2}{\frac{1}{n}\|x^n\|_2^2}, \tag{12}$$

where $F$ is the $n \times n$ unitary DFT matrix

$$F_{k,\ell} = \frac{1}{\sqrt{n}}\, e^{2\pi i \frac{k\ell}{n}}.$$

In view of (12), it is natural to also consider the case where the amplitude constraint (9) is replaced with

$$\|U x^n\|_\infty \le A_n, \tag{13}$$

where $U$ is some fixed orthogonal (or unitary) matrix. Note that for large $n$ there exist some ("atypical")
$x^n \in \mathbb{C}^n$ such that the lower bound (12) is far from tight [6, Chapter 4.1]. Thus, the constraint (13)
with $U = F$ is weaker than constraining inputs to those with small OFDM-PAPR$(x^n)$. Nevertheless, it will be shown
that even with this relaxation $A_n$ is required to be of order $\sqrt{\log n}$.
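The DFT-based lower bound (12) can be evaluated directly from the samples. The sketch below uses a naive $O(n^2)$ transform to stay dependency-free, and indexes rows and columns from 0; both are choices made for illustration, and the function names are ours.

```python
import cmath
import math


def unitary_dft(x):
    """Apply the n x n unitary DFT matrix F[k][l] = exp(2*pi*i*k*l/n) / sqrt(n)."""
    n = len(x)
    return [
        sum(x[l] * cmath.exp(2j * math.pi * k * l / n) for l in range(n)) / math.sqrt(n)
        for k in range(n)
    ]


def papr(x):
    """Sample-domain PAPR: squared peak amplitude over average power."""
    peak_power = max(abs(v) for v in x) ** 2
    average_power = sum(abs(v) ** 2 for v in x) / len(x)
    return peak_power / average_power


def pmepr_lower_bound(x):
    """Right-hand side of (12): PAPR of F x (F is unitary, so norms agree)."""
    return papr(unitary_dft(x))


if __name__ == "__main__":
    print(pmepr_lower_bound([1.0] * 8))          # energy piles up in one DFT bin
    print(pmepr_lower_bound([1.0] + [0.0] * 7))  # impulse spreads evenly
```

The constant sequence has sample-domain PAPR equal to 1 but the worst possible value $n$ after the transform, which illustrates why the pre- and post-rotation constraints (9) and (13) are treated separately.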

The question of constellations in $\mathbb{C}^n$ with good minimum distance properties and small OFDM-PAPR has been
addressed in [7]. In particular, it was shown that the (Euclidean) Gilbert-Varshamov bound is achievable with codes
whose OFDM-PAPR is $O(\log n)$. When $x^n \sim \mathcal{N}_c(0, P)^n$, the resulting distribution of OFDM-PAPR was
analyzed in [8]. For $x^n$ so distributed, as well as for $x^n$ chosen uniformly on the sphere, OFDM-PAPR tightly
concentrates around $\log n$, cf. [6, Chapter 6]. Similarly, if the components of $x^n$ are independently and
equiprobably sampled from the $M$-QAM or $M$-PSK constellations, OFDM-PAPR again sharply peaks around $\log n$,
cf. [9]. If $x^n$ is an element of a BPSK-modulated BCH code, then again OFDM-PAPR is around $\log n$ for most
codewords [6], [9].

II. Main results 

We start from a simple observation that achieving capacity (without stronger requirements like (10)) is possible
with arbitrarily slowly growing PAPR:

Proposition 1: Let $A_n \to \infty$. Then for any $\epsilon \in (0, 1)$ there exists a sequence of
$(n, M_n, \epsilon, P)$ codes satisfying (9) such that

$$\frac{1}{n} \log M_n \to C(P).$$

Proof: Indeed, as is well known, e.g. [10, Chapter 10], selecting $M_n = \exp\{nC(P) + o(n)\}$ codewords with
i.i.d. Gaussian entries $X_j \sim \mathcal{N}(0, P)$ results (with high probability) in a codebook that has vanishing
probability of error under maximum likelihood decoding. Let us now additionally remove all codewords violating (9).
This results in a codebook with a random number $M'_n \le M_n$ of codewords. However, we have

$$\mathbb{E}[M'_n] = M_n\, \mathbb{P}[\|X^n\|_\infty \le A_n] \tag{14}$$
$$= M_n \Big(1 - 2Q\Big(\frac{A_n}{\sqrt{P}}\Big)\Big)^n \tag{15}$$
$$= M_n \cdot \exp\{o(n)\} = \exp\{nC(P) + o(n)\}. \tag{16}$$

The usual random coding argument then shows that there must exist a realization of the codebook that simultaneously
has small probability of error and number of codewords no smaller than $\frac{1}{2}\mathbb{E}[M'_n]$. ■
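The step from (15) to (16) rests on the per-symbol factor $1 - 2Q(A_n/\sqrt{P})$ tending to one. A minimal sketch of the resulting per-symbol rate loss, in nats and with hypothetical function names, is:

```python
import math


def q_func(x):
    """Gaussian tail function Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def expurgation_rate_loss(A, P):
    """-log P[|X_1| <= A] in nats for X_1 ~ N(0, P).

    By (15), n times this quantity is log(M_n / E[M'_n]).
    """
    return -math.log1p(-2 * q_func(A / math.sqrt(P)))


if __name__ == "__main__":
    P = 1.0  # illustrative power
    for A in (1.0, 2.0, 4.0, 8.0):
        print(A, expurgation_rate_loss(A, P))
```

Since the loss decreases to zero as the amplitude limit grows, any sequence $A_n \to \infty$ costs only $o(n)$ in $\log M_n$, however slowly it grows.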
Remark 1: Clearly, by applying $U^{-1}$ first and using the invariance of the distribution of the noise $Z^n$ to
rotations, we can also prove that there exist capacity-achieving codes satisfying the "post-rotation" amplitude
constraint (13). A more delicate question is whether there exist good codes with small PMEPR (which approximates
OFDM-PAPR). In that regard, [8] and [6, Chapter 5.3] show that if $X^n \sim \mathcal{CN}(0, P I_n)$ we have

$$\mathbb{P}[\mathrm{PMEPR}(X^n) \le A_n^2] \approx e^{-\sqrt{\pi/3}\, n A_n e^{-A_n^2}}.$$

Thus, repeating the expurgation argument in (16) we can show that there exist codes with arbitrarily slowly growing
OFDM-PAPR and achieving capacity. Furthermore, there exist codes achieving the expansion in (6) to within
$O(\sqrt{n})$ terms with OFDM-PAPR of order $\log n$.

From Proposition 1 it is evident that the question of minimal allowable PAPR is only meaningful for good codes,
i.e. ones that attain $\log M^*(n, \epsilon, P)$ to within, say, terms of order $O(n^\alpha)$. The following lower
bound is the main result of this note:

Theorem 2: Consider an $(n, M, \epsilon, P)$ code for the AWGN channel with $\epsilon < 1/2$ and

$$\log M \ge nC(P) - \gamma n^\alpha \tag{17}$$

for some $\alpha \in [1/2, 1)$ and $\gamma > 0$. Define

$$\delta_{\alpha,P} = (1 - \alpha)(\sqrt{1 + P} - 1)^2. \tag{18}$$

Then for any $\delta < \delta_{\alpha,P}$, there exists an $N_0 = N_0(\alpha, P, \delta, \gamma, \epsilon)$, such that
if $n \ge N_0$, then for any $n \times n$ orthogonal matrix $U$ at least $\frac{M}{2}$ codewords satisfy

$$\|U x\|_\infty \ge \sqrt{2\delta \log n}. \tag{19}$$



Remark 2: The function $\alpha \mapsto \delta_{\alpha,P}$ suggests there exists a tradeoff between the convergence
speed and the peak amplitude for a fixed average power budget $P$. Choosing $U$ to be the identity matrix, Theorem 2
implies that any sequence of codes with rate $C(P) - O(n^{\alpha - 1})$ needs to have PAPR at least

$$\frac{2\delta_{\alpha,P}}{P} \log n = \frac{2(1 - \alpha)(\sqrt{1 + P} - 1)^2}{P} \log n.$$

In particular for $\alpha = \frac{1}{2}$, note that $\frac{\delta_{\alpha,P}}{P} < \frac{1}{2}$ for $P > 0$. On the
other hand, $X^n$ independently drawn from the optimal input distribution $\mathcal{N}(0, P)$ has PAPR
$2 \log n\, (1 + o(1))$ with high probability regardless of $P$. It is unclear what the optimal $\alpha$-$\delta$
tradeoff is or whether it depends on the average power $P$.
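To get a feel for the gap discussed in Remark 2, the sketch below evaluates $\delta_{\alpha,P}$ exactly as written in (18), together with the coefficient $2\delta_{\alpha,P}/P$ multiplying $\log n$ in the PAPR lower bound. The function names and the sample values of $P$ are ours.

```python
import math


def delta_alpha_P(alpha, P):
    """delta_{alpha,P} = (1 - alpha) (sqrt(1 + P) - 1)^2, as stated in (18)."""
    return (1 - alpha) * (math.sqrt(1 + P) - 1) ** 2


def papr_lower_coeff(alpha, P):
    """Coefficient of log n in the PAPR lower bound, 2 * delta_{alpha,P} / P."""
    return 2 * delta_alpha_P(alpha, P) / P


if __name__ == "__main__":
    for P in (0.1, 1.0, 10.0, 100.0):
        # The i.i.d. Gaussian input has PAPR coefficient 2 for comparison.
        print(P, papr_lower_coeff(0.5, P))
```

At $\alpha = 1/2$ the coefficient stays below 1 for every $P$ and approaches 1 only as $P \to \infty$, whereas the i.i.d. Gaussian input has coefficient 2; this is the gap that the remark leaves open.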

Proof: We start with a few simple reductions of the problem. First, any code
$\{c_1, \ldots, c_M\} \subset \mathbb{R}^n$ can be rotated to $\{U^{-1} c_1, \ldots, U^{-1} c_M\}$ without affecting
the probability of error. Hence, it is enough to show (19) with $U = I_n$, the $n \times n$ identity matrix. Second,
by taking some $\epsilon' > \epsilon$ and reducing the number of codewords from $M$ to $M' = c_\epsilon M$ we may
further assume that the resulting $(n, M', \epsilon')$ subcode has small maximal probability of error, i.e.

$$\mathbb{P}[\hat{W} \neq i \mid W = i] \le \epsilon', \qquad i \in \{1, \ldots, M'\}.$$

Note that by Markov's inequality, $c_\epsilon \ge 1 - \frac{\epsilon}{\epsilon'}$. Since $\epsilon < 1/2$ we may have
$c_\epsilon > 1/2$ by choosing $\epsilon' \in (2\epsilon, 1)$. Third, if the resulting code contains fewer than
$\frac{M}{2}$ codewords satisfying (19), then by removing those codewords we obtain an $(n, M'', \epsilon', P)$ code
with $M'' \ge (c_\epsilon - \frac{1}{2}) M$, so that

$$\log M'' \ge nC(P) - \gamma n^\alpha + \log\Big(c_\epsilon - \frac{1}{2}\Big) = nC(P) - \gamma' n^\alpha.$$

Thus, overall by replacing $\gamma$ with $\gamma'$, $M$ with $M''$ and $\epsilon$ with $\epsilon'$ it is sufficient
to prove: Any $(n, M, \epsilon, P)$ code with maximal probability of error $\epsilon$ satisfying (17) must have at
least one codeword such that

$$\|x\|_\infty \ge \sqrt{2\delta \log n}, \tag{20}$$

provided $n \ge N_0$ for some $N_0 \in \mathbb{N}$ depending only on $(\alpha, \epsilon, P, \gamma, \delta)$. We
proceed to showing the latter statement.

In [4, Theorem 7] (see also [11]) it was shown that for any $(n, M, \epsilon, P)$ code with maximal probability of
error $\epsilon$ we have

$$D(P_{Y^n} \| P^*_{Y^n}) \le nC(P) - \log M + a\sqrt{n},$$

where $a > 0$ is some constant depending only on $(\epsilon, P)$, $P^*_{Y^n} = \mathcal{N}(0, 1 + P)^n$ and
$P_{Y^n}$ is the distribution induced at the output of the channel (1) by the uniform message
$W \in \{1, \ldots, M\}$. In the conditions of the theorem we have then

$$D(P_{Y^n} \| P^*_{Y^n}) \le \gamma n^\alpha + a\sqrt{n} \le \gamma' n^\alpha, \tag{21}$$

where $\gamma'$ can be chosen to be $\gamma + a$.

Next we lower bound $D(P_{Y^n} \| P^*_{Y^n})$ by solving the following $I$-projection problem:

$$u_n(A) = \inf D(P_{Y^n} \| \mathcal{N}(0, 1 + P)^n), \tag{22}$$

where $P_{Y^n}$ ranges over the following convex set of distributions:

$$P_{Y^n} = P_{X^n} * \mathcal{N}(0, 1)^n, \qquad \mathbb{P}[\|X^n\|_\infty \le A] = 1.$$

Since the reference measure in (22) is of product type and
$D(P_{Y^n} \| \prod_{i=1}^n Q_{Y_i}) \ge \sum_{i=1}^n D(P_{Y_i} \| Q_{Y_i})$, we have

$$u_n(A) = n\, u_1(A). \tag{23}$$
To lower bound $u_1(A)$, we use the Pinsker inequality

$$D(P \| Q) \ge 2 \log e\; \mathrm{TV}^2(P, Q), \tag{24}$$

where the total variation distance is defined by $\mathrm{TV}(P, Q) = \sup_E |P(E) - Q(E)|$ with $E$ ranging over all
Borel sets. Next we lower bound $\mathrm{TV}(P_{Y_1}, \mathcal{N}(0, 1 + P))$ in a similar manner as in
[12, Section VI-B]. To this end, let $Y_1^* \sim \mathcal{N}(0, 1 + P)$. Fix $r > \frac{1}{\sqrt{1 + P} - 1}$. Since
$\mathbb{P}[|X_1| \le A] = 1$, applying the union bound yields

$$\mathbb{P}\big[|Y_1| > r\sqrt{1 + P}\, A\big] \le \mathbb{P}\big[|Z_1| > A(r\sqrt{1 + P} - 1)\big] = 2Q\big(A(r\sqrt{1 + P} - 1)\big). \tag{25}$$

On the other hand,

$$\mathbb{P}\big[|Y_1^*| > r\sqrt{1 + P}\, A\big] = 2Q(rA). \tag{26}$$

Assembling (25) and (26) gives

$$\mathrm{TV}(P_{Y_1}, \mathcal{N}(0, 1 + P)) \ge 2Q(rA) - 2Q\big(A(r\sqrt{1 + P} - 1)\big). \tag{27}$$

Combining (24) and (27), we have

$$u_1(A) \ge \big(Q(rA) - Q((r\sqrt{1 + P} - 1)A)\big)^2\, 8 \log e. \tag{28}$$

Suppose that $A_n = \|X^n\|_\infty \le \sqrt{2\delta \log n}$, and let $r > \frac{1}{\sqrt{1 + P} - 1}$ be fixed as
above. Note that for all $x > 0$,

$$\frac{x}{1 + x^2}\, \varphi(x) \le Q(x) \le \frac{\varphi(x)}{x}, \tag{29}$$

where $\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ is the standard normal density. Assembling (21), (22), (23) and
(28), we have

$$\gamma' n^{\alpha - 1} \ge \big(Q(r\sqrt{2\delta \log n}) - Q((r\sqrt{1 + P} - 1)\sqrt{2\delta \log n})\big)^2\, 8 \log e \ge c_1 \frac{n^{-\delta r^2}}{\sqrt{\log n}} \tag{30}$$

for all $n \ge N_0$, where $c_1$ and $N_0$ only depend on $P$ and $r$. Hence

$$\delta \ge \frac{1 - \alpha}{r^2} - c_2 \frac{\log \log n}{\log n}$$

for some constant $c_2$ that only depends on $P$ and $r$. By the arbitrariness of $r$, we complete the proof of (20). ■
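The single-letter bound (28) can be sanity-checked numerically on a concrete amplitude-limited input. The sketch below takes $X$ uniform on $\{-A, +A\}$ (our choice of test input, which satisfies $|X| \le A$), computes $D(P_Y \| \mathcal{N}(0, 1+P))$ in nats by the trapezoidal rule, and compares it with the right-hand side of (28). The function names, the integration window and the step count are illustrative assumptions.

```python
import math


def q_func(x):
    """Gaussian tail function Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def normal_pdf(x, var=1.0):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)


def u1_lower_bound(A, P, r):
    """Right-hand side of (28) in nats: 8 (Q(rA) - Q((r sqrt(1+P) - 1) A))^2."""
    gap = q_func(r * A) - q_func((r * math.sqrt(1 + P) - 1) * A)
    return 8 * gap ** 2


def divergence_binary_input(A, P, margin=12.0, steps=20000):
    """D(P_Y || N(0, 1+P)) in nats for Y = X + Z with X uniform on {-A, +A}."""
    span = A + margin  # integrate over [-span, span]; the tails are negligible
    h = 2 * span / steps
    total = 0.0
    for i in range(steps + 1):
        y = -span + i * h
        p = 0.5 * (normal_pdf(y - A) + normal_pdf(y + A))
        q = normal_pdf(y, 1 + P)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end weights
        total += weight * p * math.log(p / q) * h
    return total


if __name__ == "__main__":
    A, P = 1.0, 1.0
    print(divergence_binary_input(A, P))
    for r in (2.5, 3.0, 4.0):  # all exceed 1 / (sqrt(1 + P) - 1)
        print(r, u1_lower_bound(A, P, r))
```

For small amplitudes the bound is far from tight, which is expected: it uses only a single tail event together with Pinsker's inequality, and its role in the proof is asymptotic.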
Theorem 3: Any $(n, M, \epsilon, P)$ code with maximal probability of error $\epsilon$ must contain a codeword
$x^n$ such that

$$\|x^n\|_\infty \ge A, \tag{31}$$

where $A$ is determined as the solution to

$$\big(Q(r^* A) - Q((r^*\sqrt{1 + P} - 1)A)\big)^2\, 8 \log e = C - \frac{1}{n}\log M + \sqrt{\frac{6(3 + 4P)}{n}}\, \log e + \frac{1}{n} \log \frac{2}{1 - \epsilon},$$

where

$$r^* = \frac{\sqrt{A^2 + P \log(P + 1)} + A\sqrt{P + 1}}{AP}. \tag{32}$$
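The choice of $r^*$ in (32) can be checked numerically. The sketch below reads the logarithm in (32) as the natural logarithm (an assumption on the base, under which $r^*$ is the stationary point of the tail gap) and compares the gap at $r^*$ against a grid of other values of $r$; all names are ours.

```python
import math


def q_func(x):
    """Gaussian tail function Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def tail_gap(r, A, P):
    """Q(rA) - Q((r sqrt(1+P) - 1) A), the quantity squared in (28)."""
    return q_func(r * A) - q_func((r * math.sqrt(1 + P) - 1) * A)


def r_star(A, P):
    """The value of r in (32), with log taken as the natural logarithm."""
    return (math.sqrt(A * A + P * math.log1p(P)) + A * math.sqrt(1 + P)) / (A * P)


if __name__ == "__main__":
    A, P = 3.0, 10.0  # illustrative values
    rs = r_star(A, P)
    print(rs, tail_gap(rs, A, P))
    for scale in (0.5, 0.8, 1.25, 2.0):
        print(scale * rs, tail_gap(scale * rs, A, P))
```

On such a grid the gap at $r^*$ is the largest, consistent with the statement in the proof that the right-hand side of (28) is maximized by this choice.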

Remark 3 (Numerical evaluation): Consider SNR = 20 dB ($P = 100$), $\epsilon = 10^{-3}$ and blocklength
$n = 10^4$. Then, any code achieving 95%, 99% and 99.9% of the capacity is required to have PAPR of at least
$-1.2$ dB (trivial bound), 1.99 dB and 3.85 dB, respectively.

Proof: The proof in [4] actually shows

$$D(P_{Y^n} \| P^*_{Y^n}) \le nC - \log M + \sqrt{6n(3 + 4P)}\, \log e + \log \frac{2}{1 - \epsilon}.$$

Let $A_n = \|x^n\|_\infty$. Using $D(P_{Y^n} \| P^*_{Y^n}) \ge n\, u_1(A_n)$ and the lower bound on $u_1(A)$ in (28),
we obtain the result after noticing that the right-hand side of (28) is maximized by choosing $r$ as in (32). ■

III. Amplitude-constrained AWGN capacity 

As an aside to the result in the previous section, we investigate the following question: How fast does the
amplitude-constrained AWGN capacity converge to the classical AWGN capacity when the amplitude constraint grows? To
this end, let us define

$$C(A, P) = \sup_{\mathbb{E}[X^2] \le P,\ |X| \le A \text{ a.s.}} I(X; X + Z). \tag{33}$$

This quantity was first studied by Smith [2], who proved the following: For all $A, P > 0$,
$C(A, P) < C(\infty, P) = \frac{1}{2}\log(1 + P)$. Moreover, the maximizer of (33), being clearly non-Gaussian, is in
fact finitely supported. Little is known about the cardinality or the peak amplitude of the optimal input.
Algorithmic progress has been made in [13], where an iterative procedure for computing the capacity-achieving input
distribution for (33) based on cutting-plane methods is proposed. On the other hand, the lower semi-continuity of
mutual information immediately implies that $C(A, P) \to \frac{1}{2}\log(1 + P)$ as $A \to \infty$. A natural ensuing
question is the speed of convergence. The next result shows that the backoff to Gaussian capacity due to the
amplitude constraint vanishes at the same speed as the Gaussian tail.
Theorem 4: For any $A, P > 0$,

$$\frac{8 \log e\, (\sqrt{1 + P} - 1)^2 \log^2(1 + P)}{\big(A + \sqrt{A^2 + P \log(1 + P)}\big)^2}\; \varphi^2\!\left(\frac{\sqrt{(1 + P)(A^2 + P \log(1 + P))} + A}{P}\right) \le \frac{1}{2}\log(1 + P) - C(A, P) \tag{34}$$



- — 

Consequently, 

$$\exp\left(-\frac{A^2}{(\sqrt{1 + P} - 1)^2} + O(\log A)\right) \le \frac{1}{2}\log(1 + P) - C(A, P) \le \exp\left(-\frac{A^2}{2P} + O(\log A)\right), \qquad A \to \infty. \tag{36}$$

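Proof: The lower bound follows from the proof of Theorem 2 by noting that for any $X$ such that $\mathbb{E}[X] = 0$,
$\mathbb{E}[X^2] \le P$ and $|X| \le A$,

\begin{align}
\frac{1}{2}\log(1 + P) - I(X; X + Z) &\ge \frac{1}{2}\log(1 + \mathbb{E}[X^2]) - I(X; X + Z) \notag \\
&= D(P_{X+Z} \| \mathcal{N}(0, 1 + \mathbb{E}[X^2])) \notag \\
&\ge D(P_{X+Z} \| \mathcal{N}(0, 1 + P)) \tag{37} \\
&\ge u_1(A) \tag{38} \\
&\ge \big(Q(r^* A) - Q((r^*\sqrt{1 + P} - 1)A)\big)^2\, 8 \log e, \tag{39}
\end{align}

where (37) follows from the fact that
$\inf_{s > 0} D(P_Y \| \mathcal{N}(0, s)) = D(P_Y \| \mathcal{N}(0, \mathbb{E}[Y^2]))$ for all zero-mean $Y$, while
(38) and (39) follow from (22) and (28) with $r = r^*$ as in (32), respectively. We can then further lower bound (39)
by $8 \log e\, \varphi^2(b)(b - a)^2$, where

$$b = \frac{\sqrt{(1 + P)(A^2 + P \log(1 + P))} + A}{P} > a = \frac{\sqrt{A^2 + P \log(1 + P)} + A\sqrt{1 + P}}{P}.$$

The proof of (34) is completed upon noticing that
$b - a = \frac{(\sqrt{1 + P} - 1)\log(1 + P)}{A + \sqrt{A^2 + P \log(1 + P)}}$.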
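The algebra behind the last step can be verified numerically. The sketch below computes $a$ and $b$ from their definitions, checks the closed form for $b - a$, and evaluates the resulting lower bound on the capacity backoff in nats. Reading $\log$ as the natural logarithm is an assumption, and the function names are ours.

```python
import math


def q_func(x):
    """Gaussian tail function Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def a_and_b(A, P):
    """a = r* A and b = (r* sqrt(1+P) - 1) A for r* as in (32), log = ln."""
    S = math.sqrt(A * A + P * math.log1p(P))
    a = (S + A * math.sqrt(1 + P)) / P
    b = (math.sqrt(1 + P) * S + A) / P
    return a, b


def b_minus_a_closed_form(A, P):
    """(sqrt(1+P) - 1) ln(1+P) / (A + sqrt(A^2 + P ln(1+P)))."""
    S = math.sqrt(A * A + P * math.log1p(P))
    return (math.sqrt(1 + P) - 1) * math.log1p(P) / (A + S)


def backoff_lower_bound(A, P):
    """Lower bound 8 phi(b)^2 (b - a)^2 on the capacity backoff, in nats."""
    _, b = a_and_b(A, P)
    return 8 * (normal_pdf(b) * b_minus_a_closed_form(A, P)) ** 2


if __name__ == "__main__":
    for A in (1.0, 2.0, 4.0):
        print(A, backoff_lower_bound(A, 3.0))
```

Because $Q(a) - Q(b)$ is the integral of the decreasing density $\varphi$ over $[a, b]$, it is at least $\varphi(b)(b - a)$, which is why this closed-form bound never exceeds the right-hand side of (39).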

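To prove the upper bound, we use the following input distribution: Let $X^* \sim \mathcal{N}(0, P)$. Let $X_A$ and
$\bar{X}_A$ be distributed according to $X^*$ conditioned on the events $|X^*| \le A$ and $|X^*| > A$, respectively,
i.e., $\mathbb{P}[X_A \in \cdot] = \frac{\mathbb{P}[X^* \in \cdot,\ |X^*| \le A]}{\mathbb{P}[|X^*| \le A]}$.
Then $\mathbb{E}[X_A^2] = P\Big(1 - \frac{2\frac{A}{\sqrt{P}}\varphi(A/\sqrt{P})}{1 - 2Q(A/\sqrt{P})}\Big) \le P$. Moreover, in view of (29), we have

By the concavity of mutual information in the input distribution, we have
$\frac{1}{2}\log(1 + P) \le I(X_A; X_A + Z)\, \mathbb{P}[|X^*| \le A] + I(\bar{X}_A; \bar{X}_A + Z)\, \mathbb{P}[|X^*| > A]$, hence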



| log(l + P) - Q(^=) log ( 1 + P + } 



completing the proof of (35). 